From statistical evidence to evidence of causality
While statisticians and quantitative social scientists typically study the "effects of causes" (EoC), Lawyers and the Courts are more concerned with understanding the "causes of effects" (CoE). EoC can be addressed using experimental design and statistical analysis, but it is less clear how to incorporate statistical or epidemiological evidence into CoE reasoning, as might be required for a case at Law. Some form of counterfactual reasoning, such as the "potential outcomes" approach championed by Rubin, appears unavoidable, but this typically yields "answers" that are sensitive to arbitrary and untestable assumptions. We must therefore recognise that a CoE question simply might not have a well-determined answer. It is nevertheless possible to use statistical data to set bounds within which any answer must lie. With less than perfect data these bounds will themselves be uncertain, leading to a compounding of different kinds of uncertainty. Still further care is required in the presence of possible confounding factors. In addition, even identifying the relevant "counterfactual contrast" may be a matter of Policy as much as of Science. Defining the question is as non-trivial a task as finding a route towards an answer. This paper develops some technical elaborations of these philosophical points from a personalist Bayesian perspective, and illustrates them with a Bayesian analysis of a case study in child protection.
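As a rough illustration of what "bounds within which any answer must lie" can look like, the sketch below computes the widely cited no-confounding bounds on the probability of causation from two response rates. The function name, argument names, and the assumption of no confounding are introduced here for illustration and are not taken from the abstract; the paper's own treatment may differ.

```python
def probability_of_causation_bounds(p_exposed, p_unexposed):
    """Bounds on the probability of causation, assuming no confounding.

    p_exposed:   Pr(outcome | exposed), must be > 0
    p_unexposed: Pr(outcome | unexposed)
    Both argument names are hypothetical; they stand for rates that
    would be estimated from experimental or epidemiological data.
    """
    if not (0.0 < p_exposed <= 1.0 and 0.0 <= p_unexposed <= 1.0):
        raise ValueError("probabilities must lie in [0, 1], with p_exposed > 0")
    # Lower bound: 1 - 1/RR, where RR = p_exposed / p_unexposed.
    lower = max(0.0, 1.0 - p_unexposed / p_exposed)
    # Upper bound: Pr(no outcome | unexposed) / Pr(outcome | exposed).
    upper = min(1.0, (1.0 - p_unexposed) / p_exposed)
    return lower, upper


# Example: 30% of exposed and 12% of unexposed individuals show the outcome.
lower, upper = probability_of_causation_bounds(0.30, 0.12)
```

The interval does not collapse to a point even with perfectly known rates, which matches the abstract's remark that a CoE question might not have a well-determined answer.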
β models for random hypergraphs with a given degree sequence
We introduce the beta model for random hypergraphs in order to represent the
occurrence of multi-way interactions among agents in a social network. This
model builds upon and generalizes the well-studied beta model for random
graphs, which instead only considers pairwise interactions. We provide two
algorithms for fitting the model parameters, iterative proportional scaling
(IPS) and a fixed-point algorithm, prove that both algorithms converge if the
maximum likelihood estimator (MLE) exists, and provide algorithmic and
geometric ways of dealing with the issue of MLE existence.
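To make the fixed-point idea concrete, here is a minimal sketch for the pairwise (ordinary graph) beta model that the hypergraph model generalizes. It is not the paper's hypergraph algorithm: the function names, the starting point, and the stopping rule are assumptions made for illustration. In the pairwise model an edge between nodes i and j has probability exp(b_i + b_j) / (1 + exp(b_i + b_j)), and the MLE matches expected degrees to observed degrees.

```python
import math


def fit_beta_model(degrees, max_iter=2000, tol=1e-12):
    """Fixed-point iteration for the pairwise beta model (illustrative sketch).

    Iterates b_i <- log d_i - log sum_{j != i} 1 / (exp(-b_j) + exp(b_i)),
    which is a rearrangement of the likelihood equations. Assumes every
    degree is strictly between 0 and n - 1 and that the MLE exists.
    """
    n = len(degrees)
    beta = [0.0] * n
    for _ in range(max_iter):
        new_beta = []
        for i in range(n):
            s = sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                    for j in range(n) if j != i)
            new_beta.append(math.log(degrees[i]) - math.log(s))
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta


def expected_degrees(beta):
    """Expected degree of each node under the fitted pairwise beta model."""
    n = len(beta)
    return [sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
                for j in range(n) if j != i)
            for i in range(n)]


observed = [3, 2, 2, 2, 3]
beta_hat = fit_beta_model(observed)
fitted = expected_degrees(beta_hat)  # should be close to the observed degrees
```

When the MLE does not exist (for example, a node with degree 0 or n - 1), this iteration fails or diverges, which is the kind of situation the abstract's existence results address.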
Differentially Private Model Selection with Penalized and Constrained Likelihood
In statistical disclosure control, the goal of data analysis is twofold: The
released information must provide accurate and useful statistics about the
underlying population of interest, while minimizing the potential for an
individual record to be identified. In recent years, the notion of differential
privacy has received much attention in theoretical computer science, machine
learning, and statistics. It provides a rigorous and strong notion of
protection for individuals' sensitive information. A fundamental question is
how to incorporate differential privacy into traditional statistical inference
procedures. In this paper we study model selection in multivariate linear
regression under the constraint of differential privacy. We show that model
selection procedures based on penalized least squares or likelihood can be made
differentially private by a combination of regularization and randomization,
and propose two algorithms to do so. We show that our private procedures are
consistent under essentially the same conditions as the corresponding
non-private procedures. We also find that under differential privacy, the
procedure becomes more sensitive to the tuning parameters. We illustrate and
evaluate our method using simulation studies and two real data examples.
Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Motivated by a real-life problem of sharing social network data that contain
sensitive personal information, we propose a novel approach to release and
analyze synthetic graphs in order to protect privacy of individual
relationships captured by the social network while maintaining the validity of
statistical results. A case study using a version of the Enron e-mail corpus
dataset demonstrates the application and usefulness of the proposed techniques
in solving the challenging problem of maintaining privacy and supporting
open access to network data to ensure reproducibility of existing studies and
discovering new scientific insights that can be obtained by analyzing such
data. We use a simple yet effective randomized response mechanism to generate
synthetic networks under ε-edge differential privacy, and then use
likelihood based inference for missing data and Markov chain Monte Carlo
techniques to fit exponential-family random graph models to the generated
synthetic networks.
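A randomized response mechanism for edges can be sketched in a few lines. The version below flips each potential edge independently with probability 1 / (1 + e^ε), a standard choice that gives ε-edge differential privacy; whether the paper uses exactly this flip probability, or dyad-specific ones, is an assumption here, and the function names are hypothetical.

```python
import math
import random


def flip_probability(epsilon):
    """Flip probability p with (1 - p) / p = exp(epsilon)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response_graph(adj, epsilon, rng=None):
    """Return a synthetic undirected graph by flipping each dyad independently.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with a zero
    diagonal. Each dyad (i, j), i < j, is flipped with probability
    flip_probability(epsilon); self-loops are never created.
    """
    if rng is None:
        rng = random.Random()
    n = len(adj)
    p = flip_probability(epsilon)
    synthetic = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < p:
                bit = 1 - bit  # report the opposite of the true edge status
            synthetic[i][j] = synthetic[j][i] = bit
    return synthetic


# Example usage on a 4-node path graph, with a seeded generator for repeatability.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
noisy = randomized_response_graph(path, epsilon=1.0, rng=random.Random(0))
```

Because the released graph is noisy, fitting a model to it directly would be biased; the abstract's likelihood-based inference for missing data is what accounts for the known flipping process.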
Social Indicators 1973: Statistical Considerations
1 online resource (PDF, 29 pages)
Coded Parity Packet Transmission Method for Two Group Resource Allocation
Gap value control is investigated when the number of source and parity packets
is adjusted in a concatenated coding scheme whilst keeping the overall coding
rate fixed. Packet-based outer codes, which are generated from bit-wise XOR
combinations of the source packets, are used to adjust the number of both
source and parity packets. Given the source packets, the number of parity
packets, which are the bit-wise XOR combinations of the source packets, can be
adjusted such that the
gap value, which measures the gap between the theoretical and the required
signal-to-noise ratio (SNR), is controlled without changing the actual coding
rate. Consequently, the required SNR reduces, yielding a lower required energy
to realize the transmission data rate. Integrating this coding technique with
a two-group resource allocation scheme renders efficient utilization of the total
energy to further improve the data rates. With a relatively small-sized set of
discrete data rates, the system throughput achieved by the proposed two-group
loading scheme is observed to be approximately equal to that of the existing
loading scheme, which is operated with a much larger set of discrete data rates.
The gain obtained by the proposed scheme over the existing equal rate and
equal energy loading scheme is approximately 5 dB. Furthermore, a successive
interference cancellation scheme is also integrated with this coding technique,
which can be used to decode and provide consecutive symbols for inter-symbol
interference (ISI) and multiple access interference (MAI) mitigation. With this
integrated scheme, the computational complexity is significantly reduced by
eliminating matrix inversions. In the same manner, the proposed coding scheme
is also incorporated into a novel fixed energy loading scheme, which
distributes packets over parallel channels, to control the gap value of the
data rates even though the SNR varies from one code channel to another.
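The parity packets described above are bit-wise XOR combinations of source packets. The sketch below shows only that building block, with a single parity packet over equal-length source packets and recovery of one lost packet. The function names and the single-parity layout are assumptions for illustration; the abstract's scheme adjusts the number of parity packets and is more general.

```python
def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Parity packet formed as the XOR of all source packets."""
    parity = bytes(len(packets[0]))  # all-zero packet of the right length
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return parity


def recover_missing(received, parity):
    """Recover the single missing packet (marked None) from the rest plus parity."""
    if sum(p is None for p in received) != 1:
        raise ValueError("exactly one packet must be missing")
    recovered = parity
    for packet in received:
        if packet is not None:
            recovered = xor_bytes(recovered, packet)
    return recovered


source = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(source)
restored = recover_missing([b"abcd", None, b"ijkl"], parity)
```

With k source packets and m parity packets, the outer code carries k useful packets out of k + m sent, so the split between the two can be varied while an inner code compensates to keep the overall rate fixed, as the abstract describes.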